@bouweandela bouweandela commented Jul 3, 2025

Description

Add an interface for adding new data sources. Documentation of the new interface is available here: esmvalcore.io.

The existing esmvalcore.local and esmvalcore.esgf modules have been modified to use the new interface. As an example use case, support for using intake-esgf to find input data has been added.
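
For readers who want a feel for what a pluggable data source interface can look like, here is a minimal Python sketch. It is not the actual esmvalcore.io API: the `DataSource` protocol, the `find_data` method, the `InMemorySource` class, the `find_all` helper and the facet-matching logic are all assumptions made for illustration, and `DataElement` here is only a toy stand-in for esmvalcore.io.base.DataElement.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Protocol


@dataclass
class DataElement:
    """Toy stand-in for something that can be loaded as input data."""

    name: str
    facets: dict[str, Any] = field(default_factory=dict)


class DataSource(Protocol):
    """Hypothetical interface: anything that can search for data elements."""

    def find_data(self, **facets: Any) -> Iterable[DataElement]: ...


@dataclass
class InMemorySource:
    """Hypothetical data source that searches a fixed list of elements."""

    elements: list[DataElement]

    def find_data(self, **facets: Any) -> Iterable[DataElement]:
        # Keep elements whose facets match every requested key/value pair.
        return [
            element
            for element in self.elements
            if all(element.facets.get(k) == v for k, v in facets.items())
        ]


def find_all(sources: Iterable[DataSource], **facets: Any) -> list[DataElement]:
    """Query every configured source and collect the results."""
    found: list[DataElement] = []
    for source in sources:
        found.extend(source.find_data(**facets))
    return found


source = InMemorySource(
    [
        DataElement("tas_a.nc", {"short_name": "tas", "dataset": "A"}),
        DataElement("pr_a.nc", {"short_name": "pr", "dataset": "A"}),
    ]
)
matches = find_all([source], short_name="tas")
print([element.name for element in matches])
```

The point of the sketch is that the calling code only depends on the protocol, so a local file search, an ESGF search or an intake-esgf search could in principle be swapped in without changing the caller.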

Several commands have been added:

  • esmvaltool config show: print the current configuration
  • esmvaltool config list: list available example configuration files
  • esmvaltool config copy: copy an example configuration file to your configuration directory, i.e. ~/.config/esmvaltool or the path defined by the ESMVALTOOL_CONFIG_DIR environment variable.
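
A minimal sketch of how the configuration directory described above could be resolved. The helper name `resolve_config_dir` is hypothetical, and the precedence (environment variable first, then the default directory) is an assumption based on the wording of the description, not a confirmed detail of the implementation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the configuration directory (hypothetical helper).

    Assumes ESMVALTOOL_CONFIG_DIR takes precedence over the default
    location ~/.config/esmvaltool when it is set and non-empty.
    """
    env = os.environ if env is None else env
    custom = env.get("ESMVALTOOL_CONFIG_DIR")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / ".config" / "esmvaltool"


# Example: explicit environment mappings make the behaviour easy to check.
print(resolve_config_dir({"ESMVALTOOL_CONFIG_DIR": "/tmp/my-config"}))
print(resolve_config_dir({}))
```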

To try the new intake-esgf data source, configure esmvaltool to use it by running the command esmvaltool config copy intake-esgf-data.yml.

Related to #2584

Contains changes to esmvalcore.local.DataSource that are not backwards compatible.

Link to documentation:

Follow up ideas:

  • Add descriptions to the example configuration files for displaying in the command esmvaltool config list
  • Improve validation of the data source configuration
  • Move the modules esmvalcore.esgf and esmvalcore.local into esmvalcore.io. To avoid introducing even more changes in the pull request, I will do this in a follow up pull request.
  • Make the fixes module configurable per data source
  • Add a site configuration setting that selects defaults appropriate to that site, e.g. site: levante would select data sources and dask settings appropriate to Levante, site: jasmin for Jasmin, to simplify configuration of the tool (see "Add a site option to the get_config_user command", #1706)

Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number of pull requests:

@valeriupredoi

I'll work with you on this one @bouweandela 🍺

@bouweandela bouweandela force-pushed the add-intake-esgf-support branch from e91e383 to 9d67ed5 on July 22, 2025 13:56
@bouweandela bouweandela added the enhancement New feature or request label Jul 23, 2025

@valeriupredoi valeriupredoi left a comment

having a dive in this, bud - let me know how I can help!

f"but your configuration for project '{project}' contains "
f"'{data_source}' of type '{type(data_source)}'."
)
raise TypeError(msg)

I wonder if we want to see if we can first convert it to a DataSource before we toss it out the window
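
A rough sketch of what "try to convert first, then raise" could look like. Everything here is hypothetical: the `DataSource` dataclass and its `name` and `rootpath` fields are stand-ins, `as_data_source` is an invented helper, and the first sentence of the error message is made up since the diff excerpt above only shows its tail.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class DataSource:
    """Hypothetical stand-in for a configured data source."""

    name: str
    rootpath: str


def as_data_source(project: str, data_source: Any) -> DataSource:
    """Return a DataSource, converting from a mapping when possible."""
    if isinstance(data_source, DataSource):
        return data_source
    if isinstance(data_source, Mapping):
        try:
            # Attempt the conversion before giving up on the value.
            return DataSource(**data_source)
        except TypeError:
            pass  # Wrong or missing keys: fall through to the error below.
    msg = (
        "Expected a DataSource or a mapping that can be converted to one, "
        f"but your configuration for project '{project}' contains "
        f"'{data_source}' of type '{type(data_source)}'."
    )
    raise TypeError(msg)


print(as_data_source("CMIP6", {"name": "local", "rootpath": "/data"}))
```

Whether silent conversion is desirable is a design choice; failing early with a clear message may be easier to debug than accepting loosely typed configuration.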

-------
:obj:`typing.Iterable` of :obj:`esmvalcore.io.base.DataElement`
The data elements that have been found.
"""

this is an excellent addition: we are finally abstracting the data object that gets ingested by esmvalcore, and generalizing it. Let's be careful how we implement this so it can be reused with little fuss in the future. I'd argue that "data that can be loaded" can be anything, i.e. the most generic file object (it need not be on disk, nor need it be downloaded), so we can operate with object stores too
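
To make the "most generic file object" idea concrete, here is a small sketch in which the consumer only needs a readable binary stream and does not care where the bytes live. The `open` method, the `LocalFile` and `InMemoryObject` classes and the `read_start` helper are assumptions for illustration, not part of esmvalcore; the in-memory class merely imitates bytes that could have been fetched from an object store.

```python
import io
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import BinaryIO


class DataElement(ABC):
    """Hypothetical base class: anything that can hand out a byte stream."""

    @abstractmethod
    def open(self) -> BinaryIO:
        """Return a readable binary stream for this element."""


class LocalFile(DataElement):
    """Element backed by a file on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def open(self) -> BinaryIO:
        return self.path.open("rb")


class InMemoryObject(DataElement):
    """Element backed by bytes in memory, e.g. fetched from an object store."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload

    def open(self) -> BinaryIO:
        return io.BytesIO(self.payload)


def read_start(element: DataElement, size: int = 4) -> bytes:
    """Read the first bytes without knowing where the data is stored."""
    with element.open() as stream:
        return stream.read(size)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.bin"
    path.write_bytes(b"demo-bytes")
    from_disk = read_start(LocalFile(path))

from_memory = read_start(InMemoryObject(b"demo-bytes"))
print(from_disk == from_memory)
```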

valeriupredoi commented Jul 24, 2025

this one here ties in very well with this PR, bud #2785 - enjoy your time off 🏖️

@valeriupredoi

hey @bouweandela hope you're enjoying your holiday time! I kept myself busy and we now have Zarr support (in _io.load) and have made other improvements, hence the conflicts with main; let me fix those for you now. Also, you can now pass an Intake catalog via this PR, and if that has Zarr files in S3 buckets, then we can load them and test this one 😃

codecov bot commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 93.78428% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.37%. Comparing base (459587d) to head (5fbe25c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines              Patch %   Lines
esmvalcore/_main.py                   20.00%    28 Missing ⚠️
esmvalcore/local.py                   96.34%    3 Missing ⚠️
esmvalcore/config/_data_sources.py    92.30%    2 Missing ⚠️
esmvalcore/io/intake_esgf.py          98.95%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2765      +/-   ##
==========================================
- Coverage   95.46%   95.37%   -0.09%     
==========================================
  Files         260      264       +4     
  Lines       15519    15862     +343     
==========================================
+ Hits        14815    15129     +314     
- Misses        704      733      +29     

@bouweandela bouweandela force-pushed the add-intake-esgf-support branch 2 times, most recently from 3bf06ad to ef2e7cd on September 17, 2025 09:15
@bouweandela bouweandela added this to the v2.14.0 milestone Oct 3, 2025
@bouweandela bouweandela force-pushed the add-intake-esgf-support branch 4 times, most recently from bea9cf8 to bce7c5a on October 17, 2025 10:26
Move timerange extraction to DataElement

Move tests/unit/test_provenance.py to tests/unit/provenance and add more tests
@bouweandela bouweandela force-pushed the add-intake-esgf-support branch from 0b12c7b to 1794742 on October 17, 2025 14:36
@bouweandela bouweandela force-pushed the add-intake-esgf-support branch from ca867c6 to 94287ab on October 22, 2025 16:00
@bouweandela bouweandela changed the title from "Add support for intake-esgf" to "Add an interface for adding new data sources and add support for intake-esgf as a first example" on Oct 22, 2025